ABSTRACT

A series of computer simulations were performed in order to observe the effects of item response theory (IRT) item parameter estimation error on decisions made using an IRT-based sequential probability ratio test (SPRT). Specifically, the effects of such error on misclassification rates and the average number of items required for either a mastery (pass) or nonmastery (fail) decision were observed under varied SPRT conditions. These conditions included the a priori or nominal type I ($\alpha$) and type II ($\beta$) error rates, the simple hypotheses tested by the SPRT procedure, and the composition of the item pool (specifically the a, b and c parameters which characterized the items according to a three-parameter logistic IRT model) used to administer the SPRT. The results of these simulations showed that these SPRT decisions are not greatly affected by the particular level of error in parameter estimates modeled in this study. Misclassification error rates were slightly lower and average numbers of items required for a decision were slightly greater when estimation error in the item parameters was present, but such differences appear to be negligible.

The Effect of Item Parameter Estimation Error on Decisions
Made Using the Sequential Probability Ratio Test

Wald's (1947) sequential probability ratio testing (SPRT) procedure has been proposed as a technique for making pass-fail or mastery-nonmastery decisions in adaptive testing situations (Reckase, 1983). The SPRT was originally proposed by Wald in order to decide between two simple hypotheses, $H_0$ and $H_1$, or

$$H_0: \theta = \theta_0$$

vs.

$$H_1: \theta = \theta_1,$$

where $\theta$ is an unknown parameter of the distribution of some random variable, $X$. In a cognitive testing situation, the random variable, $X$, is the response to a test item and is usually assumed to be a dichotomous response, correct or incorrect.

In the case of cognitive testing, the random variable, $X$, is assumed to follow a binomial distribution. If $P(\theta_i)$ is the probability that examinee $i$ will respond correctly to any item and $Q(\theta_i) = 1 - P(\theta_i)$ is the probability of an incorrect response from examinee $i$, then (for any single item) the random variable, $X$, represents a single Bernoulli trial and is distributed as $\mathrm{bin}[P(\theta_i), 1]$. Then, let

$$\pi(\theta_i) = \mathrm{Prob}(X = x \mid \theta = \theta_i) = P(\theta_i)^{x}\, Q(\theta_i)^{1-x},$$

where

$$x = \begin{cases} 1, & \text{correct response} \\ 0, & \text{incorrect response.} \end{cases}$$


For any single item, the probability of observing $X = x$ under the alternative hypothesis is $\pi(\theta_1)$. Under the null hypothesis, this probability is $\pi(\theta_0)$. The functions $\pi(\theta_1)$ and $\pi(\theta_0)$ are called likelihood functions of $x$. A ratio of these two functions, $L(x) = \pi(\theta_1)/\pi(\theta_0)$, is called a likelihood ratio.

Two error probabilities, $\alpha$ and $\beta$, can be defined, where

$$\mathrm{Prob}(\text{choosing } H_1 \mid H_0 \text{ is true}) = \alpha$$

and

$$\mathrm{Prob}(\text{choosing } H_0 \mid H_1 \text{ is true}) = \beta.$$


Wald (1947) defined two likelihood ratio boundaries using inequalities which involved these error probabilities. These boundaries are $A$ and $B$, where

$$\text{lower boundary} = B \ge \beta/(1-\alpha)$$

and

$$\text{upper boundary} = A \le (1-\beta)/\alpha.$$


According to Wald's SPRT, trials or items would be observed in sequence, $x_1, x_2, \ldots, x_n$, and following each observation, the likelihood ratio, $L(x_1, x_2, \ldots, x_n)$, would be computed, where

$$L(x_1, x_2, \ldots, x_n) = \frac{\pi_1(\theta_1) \cdot \pi_2(\theta_1) \cdots \pi_n(\theta_1)}{\pi_1(\theta_0) \cdot \pi_2(\theta_0) \cdots \pi_n(\theta_0)}.$$

The likelihood ratio then would be compared to the boundaries, $A$ and $B$. If

$$L(x_1, x_2, \ldots, x_n) \ge A,$$

then $H_1$ is accepted. If

$$L(x_1, x_2, \ldots, x_n) \le B,$$

then $H_0$ is accepted. If

$$B < L(x_1, x_2, \ldots, x_n) < A,$$

then another trial is observed, or in the case of cognitive testing, another item is administered.
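
The stopping rule just described can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: every item has the same success probabilities under the two hypotheses (`p0` for $\theta_0$, `p1` for $\theta_1$), the boundaries are set equal to Wald's limiting values, and the function names are hypothetical rather than taken from the study. Logarithms are used so that the product of item likelihood ratios becomes a running sum.

```python
import math


def wald_boundaries(alpha, beta):
    """Return (B, A), the lower and upper likelihood-ratio boundaries."""
    lower = beta / (1.0 - alpha)
    upper = (1.0 - beta) / alpha
    return lower, upper


def sprt_decision(responses, p0, p1, alpha, beta):
    """Apply the SPRT to a sequence of 0/1 item responses.

    p0 and p1 are the probabilities of a correct response under H0 and H1.
    Returns (decision, items_used); decision is "H1", "H0", or "continue"
    when the responses run out before a boundary is reached.
    """
    lower, upper = wald_boundaries(alpha, beta)
    log_lower, log_upper = math.log(lower), math.log(upper)
    log_ratio = 0.0
    for n, x in enumerate(responses, start=1):
        if x == 1:
            log_ratio += math.log(p1 / p0)
        else:
            log_ratio += math.log((1.0 - p1) / (1.0 - p0))
        if log_ratio >= log_upper:
            return "H1", n
        if log_ratio <= log_lower:
            return "H0", n
    return "continue", len(responses)


# An examinee who answers every item correctly is classified as a master.
print(sprt_decision([1] * 20, 0.4, 0.6, 0.05, 0.05))  # ('H1', 8)
```

With `p0 = .4`, `p1 = .6`, and both nominal error rates at .05, each correct answer adds log(1.5) to the running sum, so eight consecutive correct answers are needed to cross the upper boundary, log(19).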

Once $\alpha$, $\beta$ and the hypotheses are set prior to testing, the stopping rules of the test (i.e., the boundaries) are defined. Although $\alpha$ and $\beta$ are determined prior to observing $\mathbf{x}$, where $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, Wald (1947) pointed out that the actual error rates observed in practice, $\alpha^*$ and $\beta^*$, would be bounded from above by

$$\alpha^* \le \alpha/(1-\beta)$$

and

$$\beta^* \le \beta/(1-\alpha)$$

(see Wald, 1947, p. 46). This means that even though the nominal error probabilities, $\alpha$ and $\beta$, are established prior to testing, the actual error rates can be less than these nominal rates, or even greater than the nominal rates.
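
A small Monte Carlo experiment can illustrate the relationship between a nominal rate and an observed rate. The sketch below assumes identical items, an examinee located exactly at $\theta_0$, illustrative success probabilities of .4 and .6, and a 500-item limit; the function names, the seed, and the number of replications are arbitrary choices and do not reproduce the design of the present study.

```python
import math
import random


def run_sprt(p_true, p0, p1, alpha, beta, rng, max_items=500):
    """Simulate one SPRT; return "H1", "H0", or None if no decision."""
    log_lower = math.log(beta / (1.0 - alpha))
    log_upper = math.log((1.0 - beta) / alpha)
    log_ratio = 0.0
    for _ in range(max_items):
        correct = rng.random() < p_true
        if correct:
            log_ratio += math.log(p1 / p0)
        else:
            log_ratio += math.log((1.0 - p1) / (1.0 - p0))
        if log_ratio >= log_upper:
            return "H1"
        if log_ratio <= log_lower:
            return "H0"
    return None


def observed_alpha(p0, p1, alpha, beta, replications=2000, seed=1):
    """Proportion of replications that wrongly accept H1 when H0 is true."""
    rng = random.Random(seed)
    false_h1 = sum(
        run_sprt(p0, p0, p1, alpha, beta, rng) == "H1"
        for _ in range(replications)
    )
    return false_h1 / replications


alpha, beta = 0.05, 0.05
alpha_star = observed_alpha(0.4, 0.6, alpha, beta)
bound = alpha / (1.0 - beta)
print(f"observed alpha* = {alpha_star:.3f}, Wald bound = {bound:.4f}")
```

Because the observed rate is itself a sample estimate, it fluctuates from seed to seed; only its long-run value is governed by Wald's bound.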


Reckase (1983) reported the results of computer simulation research of the SPRT procedure as it applied to tailored or computerized adaptive testing (CAT) for making mastery testing decisions. He noted that this research had three purposes: (1) to obtain information on how the SPRT procedure functioned when items were selected from the item pools on the basis of maximizing item information rather than on the basis of a simple random sampling procedure; (2) to gain experience in selecting values of $\theta_0$ and $\theta_1$, assumed to be the two critical values of ability required to be classified as nonmaster or master, respectively; and (3) to obtain information on the effects of guessing on the accuracy of classification when the form of $P(\theta)$ was the one-parameter logistic IRT (item response theory) model but a three-parameter logistic model was used to determine the responses.

Reckase's first concern, (1) above, was that, in a given pool of test items, only a small portion of these items would be available for selection for a given examinee and that the selection of test items would be based on estimates of $\theta$ after the administration of, say, $n$ items. This is because the selection of the $(n+1)$st item is dependent upon maximum item information at $\hat{\theta}_n$, $\max I(\hat{\theta}_n)$, where

$$I(\hat{\theta}_n) = \frac{[P'(\hat{\theta}_n)]^2}{P(\hat{\theta}_n)\, Q(\hat{\theta}_n)},$$

and $P'(\hat{\theta}_n)$ is the derivative of $P(\theta)$ with respect to $\theta$, evaluated at $\hat{\theta}_n$.
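
As an illustration of the maximum-information criterion, the following sketch evaluates $I(\theta)$ for a three-parameter logistic item and picks the most informative unused item at a provisional ability estimate. The closed-form derivative is obtained by differentiating the logistic form with scaling constant 1.7; the item parameters and the function names are hypothetical, and no adaptive selection of this kind was used in the simulations reported here.

```python
import math

D = 1.7  # scaling constant of the logistic model


def p3pl(theta, a, b, c):
    """Probability of a correct response under the 3-PLM."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def p3pl_derivative(theta, a, b, c):
    """Derivative of P(theta) with respect to theta."""
    logistic = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
    return D * a * (1.0 - c) * logistic * (1.0 - logistic)


def item_information(theta, a, b, c):
    """I(theta) = P'(theta)^2 / (P(theta) * Q(theta))."""
    p = p3pl(theta, a, b, c)
    return p3pl_derivative(theta, a, b, c) ** 2 / (p * (1.0 - p))


def most_informative(theta_hat, items, administered):
    """Index of the unused item with maximum information at theta_hat."""
    candidates = [j for j in range(len(items)) if j not in administered]
    return max(candidates, key=lambda j: item_information(theta_hat, *items[j]))


# Hypothetical (a, b, c) triples used only for illustration.
pool = [(1.0, -2.0, 0.2), (1.5, 0.0, 0.2), (1.0, 1.0, 0.2)]
print(most_informative(0.0, pool, administered=set()))
```

For an item with $c = 0$ the information at $\theta = b$ reduces to $(1.7a)^2/4$, which gives a quick check on the implementation.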

It would appear that this nonrandom selection process would not really be a problem because the stopping rule of the SPRT is determined by prior knowledge of $\alpha$, $\beta$, $\theta_0$ and $\theta_1$ before the test even begins and because $L(x_1, x_2, \ldots, x_n)$ is written as the product of the individual item likelihood ratios through the assumption of local independence of the $x_j$, given $\theta_i$.

However, a problem may occur when it is time to generalize the results of the mastery/nonmastery decision-making process, as defined by the SPRT. In most mastery situations, it is desirable to generalize the results of a mastery test to the entire domain of objectives measured by the test, and this domain is usually represented by the entire item pool. If, however, items are selected on the basis of $\max I(\hat{\theta}_n)$, then inferences made to the entire pool of items may be questionable. On the other hand, one could always claim that the inferences are actually being made or generalized to the ability level or the latent trait value (call it $\theta_c$) required before an individual examinee can pass the criterion number of items in the item pool, $\pi(\theta_c)$.

Perhaps a more serious concern is the effect of assuming that the function, $P(\theta_i)$, is only a function of $\theta_i$ and known item parameters. For the IRT models which would be assumed to define $P(\theta_i)$ explicitly, the item parameters are usually treated as known values in CAT administrations. The item pool contains values of these item parameters so that $L(x_1, x_2, \ldots, x_n)$ and $I(\hat{\theta}_n)$ can be computed during the test. However, these values are, themselves, estimates of the true but unknown item parameters. The estimates have been obtained in calibration computer runs prior to the CAT administrations and are stored along with the actual items in the pool.

The present computer simulation study was designed to investigate the effects of item parameter estimation error on the characteristics of the SPRT procedure. In this first phase of a thorough investigation, a strict SPRT was administered, meaning that the test was not adaptive (i.e., $\theta$ was not estimated and items were not selected for administration based on $\max I(\hat{\theta})$).

The research question to be answered by these simulations was, "What are the effects on observed type I ($\alpha^*$) and type II ($\beta^*$) error rates when an SPRT is administered from item pools which contain items whose parameters are estimates rather than known values?" A secondary interest was to observe the effects of these conditions on the average number of test items required to make a classification decision at each value of $\theta$ (particularly at $\theta_0$ and $\theta_1$). This number, called the average sample number (ASN), is a function of the stopping rule of the test (i.e., it is a function of $\alpha$, $\beta$, $\theta_0$ and $\theta_1$).

Method

Two hundred eighty-eight computer simulations were completed on either an IBM PC or XT. Each of these 288 simulations represented one combination of conditions from a 2 x 4 x 3 x 3 x 4 completely crossed design. Each of these runs consisted of 1000 replications of an SPRT administered to all of 24 hypothetical examinees with ability, $\theta_i$, ranging from -3.0 to +3.0, incremented by .25.

The research design conditions were (1) an estimation error condition, (2) composition of the item pools, (3) a priori type I error rate ($\alpha$), (4) a priori type II error rate ($\beta$), and (5) hypotheses. It was assumed that the item pools contained items which interacted with each examinee according to a three-parameter logistic model (3-PLM) to produce a correct or incorrect response to each item.
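
The crossing of the five design conditions can be enumerated directly. The sketch below uses the condition labels and levels reported in this paper (E1 and E2, I1 through I4, A1 through A3, B1 through B3, H1 through H4); the variable names are illustrative.

```python
from itertools import product

estimation_error = ["E1", "E2"]
item_pools = ["I1", "I2", "I3", "I4"]
alpha_rates = {"A1": 0.01, "A2": 0.05, "A3": 0.10}
beta_rates = {"B1": 0.01, "B2": 0.05, "B3": 0.10}
hypotheses = {
    "H1": (-0.25, 0.25),
    "H2": (-0.50, 0.50),
    "H3": (-0.75, 0.75),
    "H4": (-1.00, 1.00),
}

# Iterating over a dict yields its keys, so each cell is a tuple of labels.
design = list(
    product(estimation_error, item_pools, alpha_rates, beta_rates, hypotheses)
)

print(len(design))  # 288 simulation runs
print(sum(cell[0] == "E2" for cell in design))  # 144 runs per error level
```

Within each estimation error level, every item pool and every hypothesis appears in 36 runs and every nominal error rate in 48 runs, which matches the N column of the results tables.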

Conditions

Estimation error. There were two levels of the estimation error condition, absent (E1) or present (E2). Under the absent level (E1), the item parameters from the items in the pools were considered to be known values, and each of the 24 hypothetical examinees in the simulations with ability, $\theta_i$, responded to the items in the pool by comparing a deviate from a uniform distribution on the open interval, 0 to 1, with the $P(\theta_i)$ function given by the 3-PLM, abbreviated as $P_i$.
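
The response-generation step can be sketched as follows, assuming the three-parameter logistic form with scaling constant 1.7 that is used for the item pools in this study. The generator, seed, and function names are illustrative choices, not those of the original simulation programs.

```python
import math
import random


def p3pl(theta, a, b, c):
    """Probability of a correct response under the 3-PLM."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))


def simulate_response(theta, a, b, c, rng):
    """Score 1 (correct) when a uniform deviate falls below P(theta)."""
    return 1 if rng.random() < p3pl(theta, a, b, c) else 0


rng = random.Random(0)
responses = [simulate_response(0.0, 1.0, 0.0, 0.2, rng) for _ in range(10000)]
proportion_correct = sum(responses) / len(responses)
print(proportion_correct)  # close to P(0) = .2 + .8(.5) = .6
```

Over many simulated responses the proportion correct approaches $P_i$, which is all that the comparison with a uniform deviate is meant to guarantee.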




Under the present level, it was assumed that the item parameters were actually estimates derived from previous maximum likelihood estimation (MLE) calibrations on 2500 examinees with ability, $\theta$, distributed as normal with mean zero and variance one. According to the notation used by Thissen and Wainer (1982), the maximum likelihood estimates of the set of item parameters, $\xi$, are those that are located where the partial derivatives of the log of the likelihood function, summed over $N$ examinees, are zero. If $\ell$ is this sum, or

$$\ell = \sum_{i=1}^{N} \left[ x_i \log (P_i) + (1 - x_i) \log (1 - P_i) \right],$$

then, again from Thissen and Wainer (1982) but written without the subscript, these MLEs satisfy

$$\frac{\partial \ell}{\partial \xi} = \sum \left[ \frac{x}{P} \frac{\partial P}{\partial \xi} - \frac{(1 - x)}{(1 - P)} \frac{\partial P}{\partial \xi} \right] = 0. \tag{1}$$


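The inverse of the negative expected value of the matrix of second derivatives of the function, $\ell$, is the asymptotic variance-covariance matrix of the estimates, $\xi$, obtained from the relationship given by (1). If the second partial derivatives of $\ell$ are written, in general, as $\partial^2 \ell / \partial \xi_s\, \partial \xi_t$, for any parameters, $\xi_s$ and $\xi_t$, then

$$E\left( \frac{\partial^2 \ell}{\partial \xi_s\, \partial \xi_t} \right) = -N \int \left( \frac{1}{P} \frac{\partial P}{\partial \xi_s} \frac{\partial P}{\partial \xi_t} + \frac{1}{1 - P} \frac{\partial P}{\partial \xi_s} \frac{\partial P}{\partial \xi_t} \right) \phi(\theta)\, d\theta, \tag{2}$$

where $\phi(\theta)$ is taken to be a normal density with zero mean and variance one (Thissen & Wainer, 1982). In other words, if $\Sigma$ is the variance-covariance matrix of $\xi$, then $\Sigma$ is defined by the inverse of the negative of the matrix whose elements are given by (2).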


For the present level (E2) of the estimation error condition, it was assumed that the item parameters were actually estimates sampled from a multivariate normal distribution with mean vector $\xi$ and variance-covariance matrix $\Sigma$, where $\xi$ was given for the item pool used for a particular SPRT and $\Sigma$ was computed from (2).
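
One way to carry out this step numerically is sketched below. It is an illustration, not the program used in the study: the integral in (2) is approximated by the trapezoid rule over an assumed range of -6 to 6, the partial derivatives of $P$ are derived from the three-parameter logistic form with scaling constant 1.7, the two terms in (2) are combined as $1/[P(1-P)]$, and the sampled estimates are not constrained to admissible ranges (for example, $c \ge 0$). All function names are hypothetical.

```python
import math
import random

D = 1.7


def p3pl_gradient(theta, a, b, c):
    """Return P(theta) and its partial derivatives w.r.t. (a, b, c)."""
    logistic = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
    p = c + (1.0 - c) * logistic
    slope = (1.0 - c) * logistic * (1.0 - logistic)
    return p, (D * (theta - b) * slope, -D * a * slope, 1.0 - logistic)


def information_matrix(a, b, c, n_examinees=2500, lo=-6.0, hi=6.0, steps=1200):
    """Expected information, i.e. the negative of (2), by trapezoid rule."""
    info = [[0.0] * 3 for _ in range(3)]
    h = (hi - lo) / steps
    for k in range(steps + 1):
        theta = lo + k * h
        weight = h * (0.5 if k in (0, steps) else 1.0)
        phi = math.exp(-0.5 * theta * theta) / math.sqrt(2.0 * math.pi)
        p, grad = p3pl_gradient(theta, a, b, c)
        scale = n_examinees * weight * phi / (p * (1.0 - p))
        for s in range(3):
            for t in range(3):
                info[s][t] += scale * grad[s] * grad[t]
    return info


def invert_3x3(m):
    """Invert a 3 x 3 matrix by Gauss-Jordan elimination with pivoting."""
    aug = [list(row) + [float(i == j) for j in range(3)] for i, row in enumerate(m)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(3):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[3:] for row in aug]


def cholesky(m):
    """Lower-triangular factor of a symmetric positive definite matrix."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def sample_estimates(a, b, c, rng):
    """Draw (a, b, c) + e with e ~ N(0, Sigma); also return Sigma."""
    sigma = invert_3x3(information_matrix(a, b, c))
    low = cholesky(sigma)
    z = [rng.gauss(0.0, 1.0) for _ in range(3)]
    errors = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(3)]
    return (a + errors[0], b + errors[1], c + errors[2]), sigma


estimates, sigma = sample_estimates(1.0, 0.0, 0.2, random.Random(3))
print(estimates)
print([math.sqrt(sigma[i][i]) for i in range(3)])  # asymptotic standard errors
```

The square roots of the diagonal of $\Sigma$ are the asymptotic standard errors of the $a$, $b$, and $c$ estimates for a calibration sample of 2500 examinees, which gives a sense of how small the simulated estimation error is.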

Item Pools. There were four types of item pools used in the simulations. The first three consisted of 500 identical items from a three-parameter logistic IRT model of the form,

$$P(\theta_i) = c + \frac{(1 - c)}{1 + \exp\{-1.7a(\theta_i - b)\}}. \tag{3}$$


For the first pool (I1), $a = 1$, $b = 0$, and $c = 0$ for all 500 items. Under the E1 condition, these identical items represented a simple SPRT with constant success probability, $P(\theta_i)$, for a given $\theta_i$ value. Under the E2 condition, the items were still administered in sequence but were no longer identical because each item represented a different set of item parameter estimates. For example, even though $a_1 = a_2 = \ldots = a_{500}$, each $a$ parameter represented an estimate, $a_j$, where

$$a_j = a + e_{aj},$$

and $e_{aj}$ was a random deviate from a multivariate normal distribution with mean vector $\mathbf{0}$ and variance-covariance matrix $\Sigma$, defined previously.

For the second item pool (I2), $a = 1$, $b = 0$, and $c = .2$. For the third pool (I3), $a = 1.5$, $b = 0$, and $c = .2$. Again, under E1 these item parameters remained constant for all 500 items in a pool. However, under E2, item parameter values were assumed to be estimates ($a + e_{aj}$, $b + e_{bj}$, and $c + e_{cj}$, with $e_{aj}$, $e_{bj}$, and $e_{cj}$ being random deviates as before).

For the fourth item pool (I4), the 500 sets of parameters were generated from a pseudo-random number generator with $a \sim U(.5, 2.5)$, $b \sim U(\text{[illegible]}, 3.)$, and $c \sim U(.0, .2)$. This was called the random item pool.

Error Rate Conditions. Type I or $\alpha$ rates were .01 (A1), .05 (A2), and .10 (A3). Type II or $\beta$ rates were also .01 (B1), .05 (B2), and .10 (B3).

Hypotheses. In a mastery testing situation, the usual practice is to establish a single cutoff point along the ability scale, $\theta_c$, which corresponds to a minimum proportion of items in the domain, $\pi(\theta_c)$, that an examinee is expected to answer correctly in order to be classified as a master. The relationship between $\theta_c$ and $\pi(\theta_c)$, for example, might be

$$\frac{1}{n} \sum_{j=1}^{n} P_j(\theta_c) = \pi(\theta_c),$$

where $n$ is the number of items in the pool representing this testing domain. Because the SPRT procedure requires the setting of two values of $\theta$ in a simple hypothesis configuration, one usually sets $\theta_0 < \theta_c < \theta_1$. The region between $\theta_0$ and $\theta_1$ is referred to as an indifference region. Reckase (1983) stated that "in order to use the SPRT, a region must be specified around $\theta_c$ for which it does not matter whether a pass or a fail decision is made. If high accuracy is desired for the decision rule, a narrow indifference region must be specified, but more items will be required to make the decision. As the region gets wider, the decision accuracy declines, but fewer items are required" (p. 243).

In the present study, four simple hypotheses were used to establish four sizes of indifference regions around the chosen value of $\theta_c = .00$. These sets of hypotheses ($\theta_0$, $\theta_1$) were (1) H1: (-.25, .25), (2) H2: (-.5, .5), (3) H3: (-.75, .75), and (4) H4: (-1.0, 1.0).


Results

The results of these 288 computer simulations focused on the effects of the E2 condition on four characteristics or measures of an SPRT: actual or observed $\alpha$ rate ($\alpha^*$), actual or observed $\beta$ rate ($\beta^*$), average sample number or ASN when $\theta = \theta_0$, and ASN when $\theta = \theta_1$. These results are given in Tables 1 through 6 in terms of overall and marginal means and standard deviations of these variables under the E1 and E2 conditions.

Actual Error Rates

Table 1 shows that even though a nominal type I error or $\alpha$ rate was established prior to the usual SPRT, the observed rate ($\alpha^*$) was actually lower than the nominal one. Under the E1 condition, $\alpha^*$ was .007, .034, and .067 for the A1, A2, and A3 nominal rates, respectively. Under the E2 condition, these observed $\alpha^*$ rates were lower still: .005, .030, and .065 for A1, A2, and A3. However, the overall decrease in $\alpha^*$ for E2 (i.e., from .036 to .033) was quite small and probably insignificant from a practical standpoint.


There was a relatively large decrease in overall mean $\alpha^*$ under E2 for the fourth hypothesis, H4, where the mean $\alpha^* = .027$ (see Table 1). A further analysis of $\alpha^*$ by the nominal error rates, A1, A2, and A3, for this E2-H4 combination revealed that all three values of $\alpha^*$ were lower for H4, although these values were usually lower for each hypothesis under E2, regardless of the nominal $\alpha$ level.

The two exceptions, as seen in Table 2, are at the A3 level. No reasons for these lower $\alpha^*$ were apparent from inspection of further analyses within the design.

Table 3 shows that the observed $\beta$ rates ($\beta^*$) were affected even less under the E2 condition than the $\alpha$ rates. Although $\beta^*$ was usually smaller under E2 versus E1, this difference was never greater than .002. However, there was a relatively large decrease in $\beta^*$ under the I4 condition for both E1 and E2. Table 4 shows that the $\beta^*$ rate was lower under all nominal $\beta$ rates when the item pool consisted of items with variable item parameter values (either known or estimated).

Average Sample Numbers

The overall effect of E2 on average sample number (ASN) was to increase the number of test items required to make a classification decision at each $\theta$ level for which the ASN was analyzed. Table 5 shows that when $\theta = \theta_1$ this overall increase in ASN amounted to 1.1 items from E1 to E2. The greatest increase occurred under the H1 condition (42.5 to 46.8).

Table 6 shows that when $\theta = \theta_0$, the increase in ASN from E1 to E2 was even smaller (.8). Again, the greatest increase occurred under the H1 condition (41.5 to 44.2).

It was interesting to note the effects of different item pools on the ASN. Tables 5 and 6 show that, regardless of the estimation error condition, the ASN increased when items within the pool included a nonzero value for $c$, the pseudo-guessing parameter. When items became more discriminating (i.e., when the discrimination or $a$ parameter changed from 1.0 to 1.5), a decrease in ASN was noted. However, when items had variable item parameters, as was the case under the I4 or random item pool condition, the ASN increased significantly. The observed effects on the ASN under the fixed item pools, I1, I2, and I3, are more easily understood when the hypotheses and the indifference regions are transformed into functions of $\theta_0$ and $\theta_1$, namely $\pi(\theta_0)$ and $\pi(\theta_1)$. Because all of the items in these pools are identical,

$$\pi(\theta_0) = c + \frac{(1 - c)}{1 + \exp\{-1.7a(\theta_0 - b)\}}$$

and

$$\pi(\theta_1) = c + \frac{(1 - c)}{1 + \exp\{-1.7a(\theta_1 - b)\}}.$$


Table 7 shows these transformed hypotheses and indifference region lengths in terms of $\pi(\theta_0)$ and $\pi(\theta_1)$. Wald's SPRT theory predicts that the ASN for any value of $\theta$ will increase as the size of the indifference region decreases. Therefore, it is no surprise that, of the three fixed pools, the I2 pool produced the highest ASN at $\theta_0$ and $\theta_1$ while I3 showed the smallest overall ASN values.

For the random item pool, $\pi(\theta_0)$ and $\pi(\theta_1)$ in Table 7 were defined in terms of the averages, $\pi_0$ and $\pi_1$, across the 500 sets of item parameters in I4, or

$$\pi_0 = \frac{1}{500} \sum_{j=1}^{500} \left\{ c_j + (1 - c_j)/[1 + \exp\{-1.7a_j(\theta_0 - b_j)\}] \right\}$$

and

$$\pi_1 = \frac{1}{500} \sum_{j=1}^{500} \left\{ c_j + (1 - c_j)/[1 + \exp\{-1.7a_j(\theta_1 - b_j)\}] \right\}.$$

The smaller average indifference regions encountered for I4 would appear to account for larger ASN values for I4 in Tables 5 and 6.
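
The transformation of each hypothesis ($\theta_0$, $\theta_1$) into cutoff proportions can be sketched as follows for the three fixed pools, using the item parameters and hypotheses stated in the Method section. The helper `pool_proportion` averages $P_j(\theta)$ over whatever list of (a, b, c) triples it is given, so the same function would apply to a pool with varying parameters; the function and variable names are illustrative.

```python
import math


def p3pl(theta, a, b, c):
    """Probability of a correct response under the 3-PLM."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))


def pool_proportion(theta, items):
    """Average P_j(theta) over a pool given as (a, b, c) triples."""
    return sum(p3pl(theta, a, b, c) for a, b, c in items) / len(items)


fixed_pools = {"I1": (1.0, 0.0, 0.0), "I2": (1.0, 0.0, 0.2), "I3": (1.5, 0.0, 0.2)}
hypotheses = {
    "H1": (-0.25, 0.25),
    "H2": (-0.50, 0.50),
    "H3": (-0.75, 0.75),
    "H4": (-1.00, 1.00),
}


def indifference_table():
    """Rows of (pool, hypothesis, pi(theta_0), pi(theta_1), region width)."""
    rows = []
    for pool, params in fixed_pools.items():
        items = [params] * 500  # 500 identical items per fixed pool
        for name, (theta0, theta1) in hypotheses.items():
            pi0 = pool_proportion(theta0, items)
            pi1 = pool_proportion(theta1, items)
            rows.append((pool, name, pi0, pi1, pi1 - pi0))
    return rows


for pool, name, pi0, pi1, width in indifference_table():
    print(f"{pool} {name}  {pi0:.3f}  {pi1:.3f}  {width:.3f}")
```

For pool I1 under H1, for example, the computation gives cutoff proportions of about .395 and .605 and an indifference region of about .210 on the proportion-correct scale.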

Other changes in ASN under the various error rate and hypothesis conditions were again predicted by Wald's SPRT theory. For example, ASN is expected to decrease as $\alpha$ or $\beta$ increases and as the indifference region around $\theta_c$ increases. Tables 5 and 6 show that this did occur under E1 and E2.
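
These qualitative predictions can be checked against Wald's standard approximation to the ASN, which divides the expected value of the log likelihood ratio at termination by the expected log likelihood ratio contributed by a single item. The sketch below assumes identical items and ignores overshoot of the boundaries, so it is an approximation and not the simulation procedure used in this study; the success probabilities in the example are the pool I1 cutoff proportions for H1 and H4, and the function name is illustrative.

```python
import math


def wald_asn(p0, p1, alpha, beta):
    """Approximate ASN at theta_0 and theta_1 for identical items.

    p0 and p1 are the probabilities of a correct response at theta_0
    and theta_1. Returns (asn_at_theta0, asn_at_theta1).
    """
    log_a = math.log((1.0 - beta) / alpha)
    log_b = math.log(beta / (1.0 - alpha))
    z_correct = math.log(p1 / p0)
    z_incorrect = math.log((1.0 - p1) / (1.0 - p0))
    mean_z0 = p0 * z_correct + (1.0 - p0) * z_incorrect
    mean_z1 = p1 * z_correct + (1.0 - p1) * z_incorrect
    # At theta_0 the test ends at B with probability 1 - alpha, at A with alpha.
    asn0 = ((1.0 - alpha) * log_b + alpha * log_a) / mean_z0
    # At theta_1 the test ends at B with probability beta, at A with 1 - beta.
    asn1 = (beta * log_b + (1.0 - beta) * log_a) / mean_z1
    return asn0, asn1


for label, (p0, p1) in {"H1": (0.395, 0.605), "H4": (0.154, 0.846)}.items():
    for rate in (0.01, 0.05, 0.10):
        asn0, asn1 = wald_asn(p0, p1, rate, rate)
        print(f"{label} alpha=beta={rate:.2f}  ASN(theta0)={asn0:.1f}  ASN(theta1)={asn1:.1f}")
```

The approximation decreases as either nominal error rate increases and as the indifference region widens, in line with the pattern described above.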


Summary and Conclusions

Administering a test using Wald's sequential probability ratio testing procedure on item pools which contain IRT parameter estimates rather than known values did not appear to have much effect on observed mastery or nonmastery classification error rates. These observed error rates were smaller when it was assumed that the item parameters were actually MLEs based on prior calibrations involving examinees with known abilities. However, these smaller observed error rates were not appreciably different from the absent-error condition, E1. Observed error rates under both estimation error conditions were still smaller than the nominal rates established prior to testing, and this would appear to be the most important finding regarding error rates.

It should be pointed out that the amount of error in the item parameters was based on several assumptions. First, it was assumed that, during the item calibrations, ability was known. This is rarely true because ability almost always must be estimated in practice. Estimation of ability would increase the amount of error in the item parameter estimates, thereby magnifying the effects of estimation on the SPRT results. Second, the errors were derived under the assumption of normality for the (unidimensional) ability distribution. And finally, these error estimates were based on asymptotic standard error formulae, and large sample sizes of items and examinees were assumed.

The estimation error condition did appear to have some effect on the observed $\alpha^*$ rate when the largest indifference region was simulated (H4). How important this effect is in practice remains to be seen because the simulations still produced an $\alpha^*$ rate less than the nominal average and because this $\alpha^*$ rate occurred with an indifference region (-1.0, 1.0) which may be too large to be useful in actual SPRT administrations.

One noticeable finding involving $\beta^*$ was the amount of decrease in this error rate, regardless of the estimation error condition, when the nature of the item pool changed in terms of item parameters. Wald's SPRT theory makes use of the local independence assumption of IRT through the formulation of the likelihood functions under $H_0$ and $H_1$ as products of probabilities. There is nothing in the SPRT theory which requires that these probabilities be constant from item to item within the pool. And yet, from Table 3, it is obvious that when these probabilities varied considerably from item to item (I4), $\beta^*$ was significantly smaller than when the items did not vary at all (I1, I2, and I3 under E1) or varied by a very small amount (I1, I2, and I3 under E2). A similar effect on $\alpha^*$ was not observed.

On the other hand, the ASN was much larger under the I4 item pool condition, thereby leading to the following conclusion. When items are administered via SPRT procedures and those items vary considerably in $P_i$ for a given examinee, then the ASN will be larger and the $\beta^*$ rate smaller than for SPRT item pools in which the variability of $P_i$ is smaller.

The estimation error condition did yield higher ASN values at all true $\theta$ values, in general, but these increases did not appear to be significant with the item parameter estimation error used in these simulations. According to SPRT theory, the ASN of any SPRT will be a maximum for some $\theta$ value within the indifference region, ($\theta_0$, $\theta_1$). The rather large values of ASN for the H1 condition, regardless of estimation error, suggest that this hypothesis could yield ASN values greater than 50 items for some examinees with $\theta$ between -.25 and .25. Therefore, H1 may be an impractical hypothesis to consider for actual SPRT administrations due to the increased test length. Hypothesis H2 or H3 may be more reasonable in practice.

When items from item pools are chosen on some nonrandom basis (e.g., selecting items which maximize $I(\hat{\theta}_i)$ on the basis of estimates of ability, $\hat{\theta}$), the variability of $P_i$ for a given examinee may be minimal, and the effects of using SPRT in a CAT situation, for example, are not expected to change the characteristics of the test from those predicted by the SPRT theory, even when item parameter estimates are used. In fact, when administered as an SPRT, the CAT may even require fewer items and yield smaller classification errors when items are selected for administration on the basis of maximum information. Therefore, a second phase of this research will examine the characteristics of an SPRT when items are administered randomly from I4 versus when the items are administered on the basis of $\max I(\theta)$, with $\theta$ known. A third study will compare the results of the $\max I(\theta)$ procedure of item selection versus a $\max I(\hat{\theta})$ procedure, where $\theta$ is unknown and must be estimated after each item is presented.
TABLE 1

Actual Alpha Rate ($\alpha^*$)

                                            Estimation Error
                                    N       E1 (Absent)     E2 (Present)

Overall Mean                       144      .036 (.026)     .033 (.027)

Item Pool Means      I1             36      .034 (.026)     .031 (.027)
                     I2             36      .039 (.028)     .036 (.027)
                     I3             36      .033 (.026)     .033 (.028)
                     I4             36      .037 (.027)     .033 (.026)

$\alpha$ Rate Means  A1 (.01)       48      .007 (.002)     .005 (.002)
                     A2 (.05)       48      .034 (.008)     .030 (.009)
                     A3 (.10)       48      .067 (.014)     .065 (.015)

$\beta$ Rate Means   B1 (.01)       48      .036 (.027)     .033 (.027)
                     B2 (.05)       48      .036 (.027)     .033 (.027)
                     B3 (.10)       48      .036 (.026)     .034 (.027)

Hypothesis Means     H1 (±.25)      36      .039 (.028)     .037 (.029)
                     H2 (±.50)      36      .039 (.027)     .038 (.027)
                     H3 (±.75)      36      .032 (.025)     .032 (.027)
                     H4 (±1.00)     36      .034 (.027)     .027 (.023)

Note: Standard deviations are given in parentheses.


TABLE 2

Actual Alpha Rate ($\alpha^*$) Means and Standard Deviations by Hypothesis

                                      Estimation Error
Hypothesis    $\alpha$ Rate       E1 (Absent)     E2 (Present)

H1            A1                  .007 (.002)     .004 (.001)
              A2                  .038 (.007)     .035 (.007)
              A3                  .073 (.006)     .072 (.007)

H2            A1                  .008 (.002)     .007 (.001)
              A2                  .038 (.006)     .035 (.008)
              A3                  .070 (.009)     .071 (.008)

H3            A1                  .005 (.002)     .004 (.001)
              A2                  .029 (.006)     .027 (.008)
              A3                  .061 (.014)     .065 (.015)

H4            A1                  .006 (.003)     .004 (.002)
              A2                  .032 (.009)     .024 (.006)
              A3                  .063 (.021)     .052 (.019)
TABLE 3

Actual Beta Rate ($\beta^*$)

                                            Estimation Error
                                    N       E1 (Absent)             E2 (Present)

Overall Mean                       144      .032 (.025)             .031 (.026)

Item Pool Means      I1             36      .036 (.027)             .035 (.027)
                     I2             36      .037 (.027)             .035 (.028)
                     I3             36      .032 (.025)             .033 (.028)
                     I4             36      .023 (.020)             .022 (.021)

$\alpha$ Rate Means  A1 (.01)       48      .032 (.025)             .030 (.026)
                     A2 (.05)       48      .032 (.025)             .032 (.027)
                     A3 (.10)       48      .032 (.026)             .031 (.027)

$\beta$ Rate Means   B1 (.01)       48      .007 (.003)             .006 (.002)
                     B2 (.05)       48      .030 (.011)             .028 (.012)
                     B3 (.10)       48      .060 (.019)             .060 (.021)

Hypothesis Means     H1 (±.25)      36      .041 (.027)             .039 (.030)
                     H2 (±.50)      36      .036 (.028)             .034 (.026)
                     H3 (±.75)      36      .027 (.022)             .027 (.023)
                     H4 (±1.00)     36      [illegible] (.020)      .025 (.023)

Note: Standard deviations are given in parentheses.
TABLE 4

Actual Beta Rate ($\beta^*$) Means and Standard Deviations by Item Pool

[Table body not legible in the source copy.]
TABLE 5

ASN ($\theta_1$)

                                            Estimation Error
                                    N       E1 (Absent)             E2 (Present)

Overall Mean                       144      17.6 (19.6)             18.7 (20.9)

Item Pool Means      I1             36      13.5 (14.3)             13.8 (14.7)
                     I2             36      16.7 (16.8)             20.0 (20.5)
                     I3             36      10.2 ( 9.6)             10.4 ( 9.9)
                     I4             36      30.0 (27.6)             30.5 (28.6)

$\alpha$ Rate Means  A1 (.01)       48      22.8 (25.4)             25.[illegible] (27.5)
                     A2 (.05)       48      16.9 (17.2)             17.1 (17.8)
                     A3 (.10)       48      13.1 (13.4)             13.4 (13.8)

$\beta$ Rate Means   B1 (.01)       48      18.4 (20.6)             20.0 (22.6)
                     B2 (.05)       48      17.1 (19.1)             19.0 (21.7)
                     B3 (.10)       48      17.3 (19.4)             17.0 (18.7)

Hypothesis Means     H1 (±.25)      36      42.5 (24.2)             46.8 (24.1)
                     H2 (±.50)      36      14.4 ( 7.2)             14.3 ( 7.1)
                     H3 (±.75)      36       8.2 ([illegible].1)     8.2 ( 4.9)
                     H4 (±1.00)     36       5.3 ( 3.3)              5.5 ( 3.3)

Note: Standard deviations are given in parentheses.

TABLE 6

ASN ($\theta_0$)

                                            Estimation Error
                                    N       E1 (Absent)     E2 (Present)

Overall Mean                       144      16.2 (19.1)     17.0 (19.7)

Item Pool Means      I1             36      13.6 (14.6)     13.4 (14.0)
                     I2             36      16.2 (18.3)     19.3 (20.9)
                     I3             36       9.4 ( 9.5)      9.4 ( 9.4)
                     I4             36      25.6 (26.6)     25.9 (26.5)

$\alpha$ Rate Means  A1 (.01)       48      15.7 (19.1)     18.1 (21.2)
                     A2 (.05)       48      17.0 (20.1)     17.0 (19.8)
                     A3 (.10)       48      15.9 (18.6)     15.9 (18.3)

$\beta$ Rate Means   B1 (.01)       48      21.8 (25.6)     23.2 (26.4)
                     B2 (.05)       48      14.6 (15.9)     15.5 (16.2)
                     B3 (.10)       48      12.2 (12.5)     12.3 (12.7)

Hypothesis Means     H1 (±.25)      36      41.5 (23.3)     44.2 (22.0)
                     H2 (±.50)      36      12.4 ( 5.5)     12.8 ( 5.9)
                     H3 (±.75)      36       6.8 ( 3.1)      6.8 ( 3.1)
                     H4 (±1.00)     36       4.2 ( 1.7)      4.2 ( 1.8)

Note: Standard deviations are given in parentheses.
TABLE 7

Hypotheses and Indifference Regions in Terms of $\pi(\theta)$

                               Cutoff Proportions
Item Pool    Hypothesis    $\pi(\theta_0)$    $\pi(\theta_1)$    Indifference Region

I1           H1            .395               .605               .210
             H2            .299               .701               .402
             H3            .218               .782               .564
             H4            .154               .846               .692

I2           H1            .516               .684               .168
             H2            .440               .760               .320
             H3            .337               .863               .526
             H4            .324               .876               .552

I3           H1            .477               .723               .246
             H2            .375               .825               .450
             H3            .303               .897               .594
             H4            .258               .942               .684

I4           H1            .540               .616               .076 (.093)
             H2            .503               .655               .152 (.172)
             H3            .466               .692               .226 (.230)
             H4            .428               .728               .300 (.270)

Note: Standard deviations for the indifference regions in I4 are given in parentheses.